Research on Text Similarity Measurement Hybrid Algorithm with Term Semantic Information and TF-IDF Method

Abstract

TF-IDF (term frequency-inverse document frequency) is one of the traditional text similarity calculation methods based on statistics. Because TF-IDF does not consider the semantic information of words, it cannot accurately reflect the similarity between texts, while semantically enhanced methods distinguish documents poorly because vectors extended with semantically similar terms aggravate the curse of dimensionality. Aiming at this problem, this paper advances a hybrid method that combines semantic understanding with TF-IDF to calculate the similarity of texts. Based on the term similarity weighting tree (TSWT) data structure and the definition of semantic similarity from HowNet, the paper first discusses the text preprocessing and filtering process and then uses the semantic information of the key terms to calculate the similarities of documents according to the weight of the features whose weight is greater than a given threshold. The experimental results show that the hybrid method is better than pure TF-IDF and the semantic understanding method in terms of accuracy, recall, and F1-metric under different K-means clustering methods.
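
The idea in the abstract can be illustrated with a minimal Python sketch. This is not the paper's TSWT algorithm: the combination formula in `hybrid_similarity`, the toy `TERM_SIM` table (a hypothetical stand-in for HowNet-derived term similarities), and all function names are assumptions made for illustration only. The sketch computes plain TF-IDF cosine similarity, then a variant that keeps only terms whose weight exceeds a threshold and matches them by term-level semantic similarity.

```python
import math
from collections import Counter

def tfidf_vectors(docs):
    """Return one {term: tf-idf weight} dict per tokenized document."""
    n = len(docs)
    # Document frequency: number of documents containing each term.
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        vectors.append(
            {t: (c / len(doc)) * math.log(n / df[t]) for t, c in tf.items()}
        )
    return vectors

def cosine(u, v):
    """Plain cosine similarity between two sparse weight vectors."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

# Hypothetical term-similarity table standing in for HowNet (assumption).
TERM_SIM = {frozenset(("car", "automobile")): 0.9}

def term_similarity(a, b):
    """Semantic similarity of two terms in [0, 1]; 1.0 for identical terms."""
    if a == b:
        return 1.0
    return TERM_SIM.get(frozenset((a, b)), 0.0)

def hybrid_similarity(u, v, threshold):
    """Assumed combination rule: keep terms with weight > threshold, then
    average the weighted best semantic match in both directions."""
    ku = {t: w for t, w in u.items() if w > threshold}
    kv = {t: w for t, w in v.items() if w > threshold}
    if not ku or not kv:
        return 0.0

    def directed(a, b):
        matched = sum(w * max(term_similarity(t, s) for s in b)
                      for t, w in a.items())
        return matched / sum(a.values())

    return 0.5 * (directed(ku, kv) + directed(kv, ku))

docs = [
    ["car", "engine", "repair"],
    ["automobile", "engine", "repair"],
    ["stock", "market", "crash"],
]
vecs = tfidf_vectors(docs)
print("cosine:", cosine(vecs[0], vecs[1]))
print("hybrid:", hybrid_similarity(vecs[0], vecs[1], threshold=0.1))
```

In this toy corpus the first two documents differ only in "car" versus "automobile", so plain TF-IDF cosine treats their most distinctive terms as unrelated, while the thresholded semantic variant credits the match. Filtering by weight before matching keeps the number of term pairs small, which is one plausible way to limit the dimensionality growth the abstract mentions.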

Similar resources

Term Weighting: Novel Fuzzy Logic based Method Vs. Classical TF-IDF Method for Web Information Extraction

Solving the term weighting problem is one of the most important tasks in Information Retrieval and Information Extraction. Typically, the TF-IDF method has been widely used for determining the weight of a term. In this paper, we propose a novel alternative fuzzy logic based method. The main advantage of the proposed method is that it obtains better results, especially in terms of extracting not...

The DF-ICF Algorithm- Modified TF-IDF

The tf-idf algorithm is generally used where massive data processing is done. Tf-idf is the weight given to a particular term within a document, and it is proportional to the importance of the term. This paper aims to use the idea behind the tf-idf algorithm to design the df-icf algorithm, which finds the importance of a particular document within the given corpus. General Terms DF-IC...

Semantic Search Engine using Joomla Framework with Modified tf-idf and TRApriori Algorithm

As the amount of data available in a repository increases, content retrieval from the huge data stored in the repository becomes a tedious task. Though a Content Management System helps us manage the data, searching for the relevant data is still a daunting task. For that, we need efficient search algorithms for maximizing the correlation between data required and data returned by semantic sea...

Recommended model of information flow based on TF-IDF

This article constructed personalized recommendation models in online social streams based on tie strength, topic relevance, and trust dimensions. The experiments on the Sina blogs data showed that the proposed method could effectively reduce the ranks of irrelevant tweets and achieve better performance than several baseline methods based on cosine similarity and hashtags.

Discriminative Features Selection in Text Mining Using TF - IDF Scheme

This paper describes a technique for discriminative feature selection in text mining. 'Text mining' is the discovery of new, previously unknown information by computer. Discriminative features are the most important keywords or terms inside a document collection which describe the informative news included in the document collection. The generated keyword sets are used to discover Association Rules am...

Journal

Journal title: Advances in Multimedia

Year: 2022

ISSN: 1687-5680, 1687-5699

DOI: https://doi.org/10.1155/2022/7923262